Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch missed tick behavior to delay instead of default burst #108

Conversation

domZippilli
Copy link
Contributor

We encountered an interesting bug that we fixed and felt we should upstream, as it presents itself in the sample as well.

We observed that seemingly randomly, this loop (or rather, our similar but modified one) would get "hot" and run as fast as possible. This would consume a lot of CPU and other resources, and of course meant a lot of SYN packets being sent somewhere proximal to the peer. These "reconnect storms" could last as little as a few seconds, or up to hours, after which they would spontaneously resolve.

We could observe this because we'd added logging to the reconnect attempts, so all we knew was that some loop was hot, but not which of them. We initially tried rewriting the loop to use combinators with take and maximum attempt values, but this did not fix the problem. This suggested that the outer loop was the issue, but it took us some time to figure out how that could be.

The interval here uses the default missed tick behavior, which is Burst:

https://docs.rs/tokio/latest/tokio/time/enum.MissedTickBehavior.html

We believe that with a large number of peers offline (we sometimes had >5), this loop could run for longer than the Interval duration of one second. When this happened, the Interval would enter Burst mode and try to re-establish an exact cadence of running this loop on a whole multiple of the interval from the start.

Expected ticks: |     1     |     2     |     3     |     4     |     5     |     6     |
Actual ticks:   | work -----|          delay          | work | work | work -| work -----|

We switched the interval to instead use the Delay behavior, which should cause this loop to wait the interval's duration (1 second) each time tick() is called, which is probably what the author (@TheBlueMatt) intended.

Expected ticks: |     1     |     2     |     3     |     4     |     5     |     6     |
Actual ticks:   | work -----|          delay          | work -----| work -----| work -----|

Copy link
Contributor

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, good catch.

@TheBlueMatt TheBlueMatt merged commit 4d3a5b0 into lightningdevkit:main Jul 12, 2023
@domZippilli
Copy link
Contributor Author

Not sure if there's a permissions thing but I can't see this question I got emailed with a link here:

@domZippilli Out of curiosity, was there a particular reason why you chose MissedTickBehavior::Delay over MissedTickBehavior::Skip?

Not particularly, either will get the job done of not-doing-burst and trying to do this on a specific cadence. I thought the Delay behavior is more like what most people would expect, in that writing a loop like this in a single-threaded program, most people would have a sleep at one end of the loop.

@tnull
Copy link
Contributor

tnull commented Jul 14, 2023

Not sure if there's a permissions thing but I can't see this question I got emailed with a link here:

No, excuse the confusion, after posting I quickly reconsidered and deleted the question as I figured the difference between Skip and Delay is really minuscule and it's not really worth your time explaining why you chose one over the other.

Not particularly, either will get the job done of not-doing-burst and trying to do this on a specific cadence. I thought the Delay behavior is more like what most people would expect, in that writing a loop like this in a single-threaded program, most people would have a sleep at one end of the loop.

Alright, thank you! That's what I figured, too. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants